Derived variants at six genes explain nearly half 
of size reduction in dog breeds 
Selective breeding of dogs by humans has generated extraordinary diversity in body size. A number of multibreed 
analyses have been undertaken to identify the genetic basis of this diversity. We analyzed four loci discovered in a pre- 
vious genome-wide association study that used 60,968 SNPs to identify size-associated genomic intervals, which were too 
large to assign causative roles to genes. First, we performed fine-mapping to define critical intervals that included the 
candidate genes GHR, HMGA2, SMAD2, and STC2, identifying five highly associated markers at the four loci. We hypothesize 
that three of the variants are likely to be causative. We then genotyped each marker, together with previously reported 
size-associated variants in the IGF1 and IGF1R genes, on a panel of 500 domestic dogs from 93 breeds, and identified the 
ancestral allele by genotyping the same markers on 30 wild canids. We observed that the derived alleles at all markers 
correlated with reduced body size, and smaller dogs are more likely to carry derived alleles at multiple markers. However, 
breeds are not generally fixed at all markers; multiple combinations of genotypes are found within most breeds. Finally, we 
show that 46%-52.5% of the variance in body size of dog breeds can be explained by seven markers in proximity to 
exceptional candidate genes. Among breeds with standard weights <41 kg [90 lb], the genotypes accounted for 64.3% of 
variance in weight. This work advances our understanding of mammalian growth by describing genetic contributions to 
canine size determination in non-giant dog breeds. 

[Supplemental material is available for this article.] 



Domestic dogs exhibit the greatest diversity in body size of any 
land mammal. Mastiffs can be 50 times heavier than Chihuahuas, 
and Great Danes five times taller than Pekingese. Dog breeds are all 
descended from the gray wolf (Wayne 1993; Lindblad-Toh et al. 
2005) and are the product of artificial selection that began between 
15,000 and 100,000 yr ago (Vila et al. 1997; Sablin and Khlopachev 
2002; Savolainen et al. 2002; Germonpre et al. 2009; Pang et al. 
2009; Ovodov et al. 2011). However, the majority of the modern 
dog breeds were developed within the past 300 yr (American 
Kennel Club 1998; Parker et al. 2004). More than 400 breeds now 
exist worldwide, including 175 that are recognized in the United 
States by the American Kennel Club (AKC; www.akc.org). 

Modern domestic dog breeds are codified by standards, which 
apply persistent selective pressure on fixed phenotypes that 
are often breed defining, such as coat color, skull shape, leg 
length, and body size. This pressure reduces phenotypic and ge- 
netic heterogeneity within breeds, yet enormous phenotypic di- 
versity exists across breeds (Parker et al. 2004, 2007; vonHoldt 
et al. 2010). These factors, along with the genetic isolation of 
breeds, have established domestic dog breeds as an excellent ge- 
netic system for the study of complex traits, including skeletal 
size and shape variation (Chase et al. 2002; Shearin and Ostrander 
2010). 

Loci determining size have strong signatures of selection 
(Akey et al. 2010; Boyko et al. 2010; Vaysse et al. 2011). The first 
association studies of canine body size found an influential locus 
in spite of sparse marker density (Chase et al. 2002; Jones et al. 
2008). Chase et al. (2002) used genotypes at ~500 microsatellites 
to analyze the genetic basis for canid morphological variation in 
Portuguese water dogs, a breed with significant variation in skeletal 
size (Chase et al. 2002), and identified multiple quantitative trait 
loci (QTLs) related to canine body size. A locus on canine chro- 
mosome 15 (CFA15) was observed to be highly associated with 
measures of skeletal size. Further investigation by our collaborative 
group led to the identification of a single haplotype composed 
of 20 single-nucleotide polymorphisms (SNPs) that was shared 
among all small breeds (<9 kg [20 lb]), but was nearly absent from 
giant breeds (>30 kg [66 lb]) (Sutter et al. 2007). The haplotype 
spans the insulin-like growth factor 1 (IGF1) gene, which is known 
to regulate skeletal size in both mice and humans (Baker et al. 1993; 
Woods et al. 1996). 

A subsequent study by Jones et al. (2008) extended these 
findings and pioneered the use of breed-defined phenotypes 
("stereotypes") to identify associated markers, a method which is 
also used in the present study. Jones et al. (2008) tested the association 
of genotypes in 2801 dogs representing 147 breeds at 1536 
SNPs with several breed stereotypes including weight, limb length, 
and height. They identified several new body size loci as well as 
replicating findings from previous studies (Chase et al. 2002, 
2005); thus further supporting the use of breed standard measures, 
rather than individual measurements on each dog, in genetic 
studies of canine morphology. 

Subsequent studies performed by our collaborative group on 
a much larger data set of 915 dogs from 80 breeds genotyped using 
60,968 SNPs (the "CanMap project") highlighted a number of 
phenotype-associated loci (Boyko et al. 2010). Among these were 
loci important in body size, some of which had been previously 
identified (Chase et al. 2002, 2005; Jones et al. 2008). Associations 
at four of the size-associated loci were replicated in data released by 
a subsequent study of 509 dogs from 46 breeds genotyped with 
170,000 SNPs (Vaysse et al. 2011). Finally the CanMap data set was 
used by Hoopes et al. (2012) to identify a new dog body size locus 
on CFA3 at the insulin-like growth factor 1 receptor gene (IGF1R). 

Here we describe the combinatorial effects of genetic varia- 
tion at six loci on determining body size in dog breeds. At four 
autosomal loci previously found to be associated with canine body 
size (Boyko et al. 2010), the critical intervals resulting from our 
fine-mapping revealed excellent candidate genes, including growth 
hormone receptor (GHR), high mobility group AT-hook 2 (HMGA2), 
stanniocalcin 2 (STC2), and SMAD family member 2 (SMAD2). We 
genotyped the most highly associated marker(s) at each locus, together 
with highly associated markers from the IGF1 and IGF1R 
genes in a large set of dogs representing the entire range of canine 
body size. The resulting analysis shows that approximately half of 
the variance of the weights of dog breeds can be explained by 
polymorphisms at just these six loci. 

Results 

We fine-mapped four body size QTLs identified in a previous ge- 
nome-wide association study (GWAS) (Boyko et al. 2010). Initial 
critical intervals were selected based on association scores in the 
CanMap study at the following positions in CanFam3.1 coordinates: 
CFA10 (8,454,499, P= 7.06 X 10"°^), CFA4 (39,200,720, P = 
9.10 X 10"°^ and 67,026,055, P = 2.58 X 10"^^), and CFA7 
(43,865,905, P= 1.05 X 10"^^). 

Standard breed weight (SBW) was used as a surrogate for body 
size, as has been done previously (Boyko et al. 2010). Specifically, 
when a weight was specified as part of an AKC breed standard, that 
value was used as the SBW for each dog of the breed in the data set. 
For breeds with no specified weight, values from other authorities 
were used (Methods; Supplemental Table 1). Where a range or 
different weights for male and female were given, an average was 
used. Since the phenotypic basis of this study is the standard 
weights of AKC breeds, which are specified and widely referred to 
in lb units, results are reported in lb as well as kg. 
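
The derivation of an SBW from a breed standard can be illustrated with a minimal Python sketch. The text states only that "an average was used," so the midpoint-of-range rule, the function names, and the example weights below are assumptions for illustration rather than the procedure actually applied.

```python
LB_PER_KG = 2.20462  # pounds per kilogram


def standard_breed_weight_lb(male_range, female_range=None):
    """Collapse a standard's weight range(s), given in lb, into one SBW value.

    Assumption: each range is replaced by its midpoint, and male and female
    midpoints are then averaged with equal weight.
    """
    ranges = [male_range] if female_range is None else [male_range, female_range]
    midpoints = [(low + high) / 2 for low, high in ranges]
    return sum(midpoints) / len(midpoints)


def lb_to_kg(pounds):
    """Convert pounds to kilograms."""
    return pounds / LB_PER_KG


# Hypothetical standard: males 65-75 lb, females 55-65 lb.
sbw_lb = standard_breed_weight_lb((65, 75), (55, 65))
sbw_kg = lb_to_kg(sbw_lb)
```

Every dog of a breed would then be assigned the same value, so the phenotype varies between breeds but not within them.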

Fine-mapping the size loci 

Fine-mapping of the four autosomal loci validated the scan asso- 
ciations and revealed critical intervals that include the excellent 
candidate genes GHR, HMGA2, STC2, and SMAD2 (see Supple- 
mental Results, Supplemental Figs. 1-3, and Supplemental Tables 
2-5 for details on the fine-mapping experiments). The most highly 
associated variants at each locus were two nonsynonymous SNPs 
in GHR, one SNP in the 5' UTR of HMGA2, one SNP 20-kb downstream 
from STC2, and one deletion 24-kb downstream from 
SMAD2 (Fig. 1A-D; Table 1). Here we refer to each variant by the 
name of the proximal gene. The two nonsynonymous SNPs in 
GHR are termed GHR(1) and GHR(2). 

Frequency of derived alleles at size-associated markers 
in 500 dogs 

In order to determine the effective contributions of variants in or 
around IGF1, IGF1R, GHR, HMGA2, SMAD2, and STC2 on body 
size, the allele frequencies of tagging markers for each locus were 
determined from a large, physically diverse set of dogs representing 
93 breeds. 

We added previously described and highly associated markers 
at IGF1 (Sutter et al. 2007; Gray et al. 2010) and IGF1R (Hoopes 
et al. 2012) to the panel of size-associated markers identified by 
fine-mapping, for a total of seven markers (Table 1). Of note, a SNP 
(CFA15:41,221,438) and a SINE insertion (CFA15:41,220,980) in 
intron 2 of IGF1 were genotyped on DNA from 500 dogs and found 
to be in complete LD, which is consistent with previous reports 
(Sutter et al. 2007; Gray et al. 2010). Consequently, all future 
references to the IGF1 variant refer to the SNP, but the conclusions 
apply to the SINE element as well. The IGF1R SNP marker 
(CFA3:41,849,479) codes for a missense mutation, as we described 
previously (Hoopes et al. 2012). 

We genotyped DNA from 500 dogs, representing 93 AKC- 
recognized breeds, at each of the seven markers (genotyping results 
are in Supplemental Table 6). Breeds span the entire range of canine 
weights. All dogs are unrelated at the grandparent level, and at least 
two males and two females were genotyped from each breed. 

To determine the ancestral allele for each marker, we geno- 
typed a set of wild canids, including 26 geographically diverse gray 
wolves, two red wolves, and two coyotes. The genotypes in the red 
wolves and coyotes were all homozygous, defining the ancestral 
alleles (Table 1; Supplemental Table 7). In gray wolves, the ances- 
tral alleles greatly predominated (Supplemental Table 7). 
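
The logic of polarizing each marker with outgroup species can be sketched as follows. This is an illustrative sketch only: the genotype encoding (a pair of allele characters per animal) and the function name are assumptions, and the example genotypes are invented.

```python
def ancestral_allele(outgroup_genotypes):
    """Return the single allele shared by all outgroup animals.

    Returns None when the outgroup animals carry more than one allele,
    in which case the ancestral state cannot be assigned this way.
    """
    alleles = set()
    for genotype in outgroup_genotypes:
        alleles.update(genotype)
    if len(alleles) == 1:
        return next(iter(alleles))
    return None


# Hypothetical marker: two red wolves and two coyotes, all homozygous G/G.
outgroup = [("G", "G"), ("G", "G"), ("G", "G"), ("G", "G")]
ancestral = ancestral_allele(outgroup)
```

The other allele observed in dogs at that marker is then treated as derived.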

The SBWs of dogs with different genotypes were compared 
(Fig. 2). To ensure that no single breed was overrepresented, we 
randomly selected only two males and two females from each 
breed for this analysis. 

Genotypes at each marker corresponded to differences in size. 
Reflecting the similarity of size between larger dogs and gray 
wolves, the ancestral alleles of each variant were always those more 
commonly found in larger dogs. For each variant, SBWs of dogs 
homozygous for the derived allele (D/D) were significantly less 
than SBWs of dogs homozygous for the ancestral allele (A/A). 
Moreover, SBWs of D/D dogs were also significantly less than the 
SBWs of heterozygotes (A/D) at four of seven markers (Fig. 2). 

When comparing across loci, we observed similar trends. At 
all loci except IGF1, the mean SBW of the D/D dogs was 4-7 kg (8- 
15 lb). For most pairs of loci, the SBWs of dogs homozygous for the 
derived allele at one locus had a distribution similar to the SBWs of 
dogs homozygous for the derived allele at each of the other loci 
(boxplots) (Fig. 2A). However, dogs that were homozygous for the 
derived allele at IGF1 had a greater size range and a higher mean 
SBW (9.8 kg [21.6 lb]) than D/D dogs at any other locus (Fig. 2B). 

The relationship of D/D and A/D dogs was more complicated 
at HMGA2, IGF1R, and GHR(2), in part because fewer heterozygotes 
were observed. HMGA2 was the most extreme, with only 16 A/D 
dogs and 87 D/D dogs (Fig. 2B). This ratio (16:87) was smaller than 
that observed at any other locus. By comparison, the small number 
of heterozygotes at IGF1R and GHR(2) was due in part to the low 
Figure 1. Fine-mapping of four loci associated with canine body size. (A-D) Regional plots of the four fine-mapped loci: CFA4:67 Mb (A), CFA10:8 Mb 
(B), CFA4:39 Mb (C), and CFA7:43 Mb (D). Each plot includes the following tracks, from top to bottom: P-values of the genotyped SNPs in the CanMap 
data set (Boyko et al. 2010) (with coordinates updated to CanFam 3.1 genome assembly); the regions of the genome covered during fine-mapping (green 
and blue; amplicons for marker discovery and SNP positions for SNPlex, respectively); genes (orange; see Methods for identifiers); and the most highly 
associated marker(s) identified in each region (red). 



frequencies of the derived alleles (7.5% and 7.3%, respectively), 
which were found almost exclusively in the smallest breeds. Dogs 
with the D/D genotype at IGF1R or GHR(2) had a breed mean 
weight of 4-4.5 kg (9-10 lb), which was consistent with our pre- 
viously reported findings (Hoopes et al. 2012). The frequency of 
genotypes did not differ between male and female dogs at any of 
the loci (no P-value <0.6). 

Allelic trends among dogs of similar weights 

A step-like pattern was apparent in the allele frequencies found in 
5-lb bins (2.3 kg) (Fig. 3). Overall, as body size decreased the derived 
allele frequency increased, as did the number of markers with de- 
rived alleles. Considering each variant separately (Fig. 3, columns), 



in most cases allele frequencies changed gradually across body 
sizes, as represented by the gradient from yellow to red. By com- 
parison, the incidence of the HMGA2 derived allele dropped 
abruptly in dogs with an SBW of 4.5-9.1 kg (10-20 lb). 
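
The binning behind this comparison can be sketched in a few lines of Python. The input layout, the function name, and the choice of left-closed bins (a 10-lb dog falls in the 10-15 lb class) are assumptions; the text does not state how bin edges were handled.

```python
from collections import defaultdict


def derived_allele_frequency_by_bin(dogs, bin_width_lb=5):
    """Derived allele frequency at one marker within SBW weight classes.

    dogs: iterable of (sbw_lb, n_derived) pairs, where n_derived is the
    number of derived alleles the dog carries at the marker (0, 1, or 2).
    Returns {bin_start_lb: frequency}, sorted by weight class.
    """
    derived = defaultdict(int)
    total = defaultdict(int)
    for sbw_lb, n_derived in dogs:
        bin_start = int(sbw_lb // bin_width_lb) * bin_width_lb
        derived[bin_start] += n_derived
        total[bin_start] += 2  # each dog contributes two alleles
    return {b: derived[b] / total[b] for b in sorted(total)}


# Invented example: four dogs genotyped at a single marker.
example = [(6, 2), (8, 1), (22.5, 1), (95, 0)]
frequencies = derived_allele_frequency_by_bin(example)
```

Repeating the calculation for each marker gives one column of a frequency-by-weight-class grid of the kind shown in Figure 3.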

While most derived alleles are observed in smaller breeds, the 
IGF1 derived allele is observed surprisingly frequently in several 
larger breeds. Notably, nine of the 10 Rottweilers (the only breed in 
the data set between 45.4 and 47.6 kg [100 and 105 lb]) were homozygous 
D/D at IGF1. 

More typically, all dogs >40.8 kg (90 lb) were either homo- 
zygous for ancestral alleles at all markers or carried derived alleles at 
only one marker, usually IGF1. Among dogs <11.3 kg (25 lb), 90% 
carried derived alleles at three or more markers, and 98% carried 
the derived allele at IGF1. 
Table 1. Size-associated markers 
The mean SBWs (±SD) of dogs were calculated for the dogs selected in Figure 2. Newly mapped markers are in the first five rows. (STC2) stanniocalcin 2; 
(GHR) growth hormone receptor; (SMAD2) SMAD family member 2; (HMGA2) high mobility group AT-hook 2; (IGF1R) insulin-like growth factor 1 
receptor; (IGF1) insulin-like growth factor 1. 
^Hoopes et al. (2012). 
^Sutter et al. (2007). 



Combinations of genotypes 

Many allelic combinations were observed when all seven markers 
were considered. To define the combination present in a given dog, 
we recorded the markers at which the dog carried the derived allele 
(A/D or D/D). While 128 possible combinations exist, only 39 were 
observed in this data set (Fig. 4A). Thirteen combinations were 
common, as defined by their presence in 10 or more dogs. In the 
Figure 2. Body size is tightly regulated in dogs homozygous for the derived alleles. (A) The standard breed weight (SBW) of each dog (y-axis) is plotted 
by genotype at each marker (x-axis). The SBWs of dogs homozygous for the derived allele (D/D) at the IGF1 marker are significantly smaller than those of dogs that 
are heterozygous (A/D) or homozygous for the ancestral allele (A/A), as determined by Kolmogorov-Smirnov and Mann-Whitney-Wilcoxon tests. (***) P < 
0.001. The distribution of SBWs for a given genotype/marker combination is generally less for homozygous D/D dogs than for other genotypes (the 
median and first and third quartiles are indicated by the boxplots). Statistics for each genotype/marker combination are summarized in B. SBWs of 
genotype classes are reported as mean ± SD. Two females and two males were randomly selected from each breed for this analysis. The SBWs of all 
selected dogs are plotted in the leftmost column. Points were randomly scattered on the x-axis within each column to facilitate visualization. 
Figure 3. Derived allele frequencies increase at multiple loci as body weight decreases. The frequency of the derived allele in 5-lb weight classes is 
represented on a color scale. The smallest dogs (bottom row) are consistently red at all markers except IGF1R, while the largest dogs rarely carry a derived 
allele, as observed in weight classes of 90-95 (40.8-43.1 kg), 95-100 (43.1-45.4 kg), and above 105 (47.6 kg). The high frequency of the IGF1 derived 
allele in the 100-105 class represents the only breed we tested in the class, Rottweilers. Dogs with an SBW above 105 lb are collapsed in a single category 
due to the lack of genotype variation in the group at these markers. This analysis includes all 500 dogs genotyped. 



most frequent combination, both alleles at every marker were 
ancestral. This combination was observed most frequently in 
large breeds (Fig. 4B), but was also noted infrequently in breeds 
with an SBW as low as 15.9 kg (35 lb) (Supplemental Table 1). 
Combinations were generally not breed-specific. Of the 31 com- 
binations that occur more than once, only two are limited to 
a single breed. Rare combinations of alleles were also identified. 
For instance, eight combinations were found in only one dog 
each, suggesting that other low-frequency combinations exist in 
the population at large. 
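
The count of 128 follows from treating each of the seven markers as either carrying a derived allele or not (2^7 = 128). A minimal Python sketch of this encoding is given below; the marker order, the genotype strings, and the function names are assumptions chosen for illustration.

```python
from collections import Counter

# The order is arbitrary; it only fixes the positions within each pattern.
MARKERS = ("STC2", "GHR(1)", "GHR(2)", "SMAD2", "HMGA2", "IGF1R", "IGF1")

# Each marker is either "derived allele present" or "absent": 2^7 patterns.
N_POSSIBLE = 2 ** len(MARKERS)


def combination(genotypes):
    """Map {marker: 'A/A' | 'A/D' | 'D/D'} to a tuple of presence flags."""
    return tuple(genotypes[m] in ("A/D", "D/D") for m in MARKERS)


def common_combinations(dogs, min_dogs=10):
    """Return the combinations carried by at least min_dogs dogs, with counts."""
    counts = Counter(combination(g) for g in dogs)
    return {combo: n for combo, n in counts.items() if n >= min_dogs}
```

Because heterozygotes and derived homozygotes are pooled, this encoding describes which markers carry a derived allele, not the full genotype.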

There is one set of combinations that is unlikely to exist in 
any dog. Of four possible haplotypes at the two nonsynonymous 
GHR markers (GC, GT, AC, and AT), only three were observed. The 
missing haplotype contains the allele associated with large dogs at 
GHR(1) and the allele associated with tiny dogs at a marker 41 bases 
away, GHR(2). In essence, we found haplotypes corresponding to 
"large + not tiny" (GC), "small + not tiny" (AC), and "small + tiny" 
(AT), but never "large + tiny" (GT). Since the GHR(2) marker T allele 
occurs at very low frequency and the two markers are in close 
proximity, we believe the GT haplotype is unlikely to exist in the 
general population. This suggests that selection of the GHR(2) 
derived variant occurred among dogs that were already carriers of 
the GHR(1) derived allele. 

In one of the few widely observed combinations, a derived 
allele was present only at the IGF1 locus. Dogs presenting this 



combination belonged to two broad categories of breeds. The first 
group contained breeds with SBWs <31.8 kg (70 lb) and included 
Basenjis, English Springer Spaniels, and American Staffordshire 
Terriers (with SBWs of 10.2, 20.4, and 29.0 kg, respectively [22.5, 
45, and 64 lb]). The second were breeds with SBWs >40.8 kg (90 lb) 
and included Mastiffs and related breeds such as Tibetan Mastiffs, 
Bullmastiffs, Dogues de Bordeaux, Rottweilers, and Black Russian 
Terriers (Fig. 4). 

The mean SBW of dogs with a given combination was calcu- 
lated (Fig. 4). As expected, the combination with the lowest mean 
SBW had derived alleles at all markers, and the heaviest combina- 
tion had no derived alleles. In some cases, breeds that vary sub- 
stantially in size shared the same combination, such as Papillons, 
Boston Terriers, and Border Collies (2.8, 7.9, and 17.9 kg, respec- 
tively [6.1, 17.5, and 39.5 lb]). Nevertheless, the standard deviation 
of SBWs for dogs sharing a combination was generally lower than 
that observed for weight groups defined by genotypes at a single 
marker (Fig. 2), indicating that combinations of genotypes explain 
body size differences better than any single genotype. 

Unifying model 

Since derived allele frequencies among the seven profiled markers 
clearly corresponded to progressive diminution, we sought to 
quantify how well alleles at these markers accounted for differ- 
Figure 4. Multiple combinations of genotypes are observed in most breeds. We assessed combinations of genotypes in individual dogs (A). The 
presence of a derived allele (whether heterozygous or homozygous) is indicated by a filled square. The first column represents the combination with 
derived genotypes at each marker; the mean weight of dogs with this combination is less than the mean weight of any other combination. The percent 
standard deviations for a given combination are typically smaller than the percent standard deviations of dogs sharing only a genotype at a single marker 
(which are reported in Fig. 2). The combinations observed in each breed are uniquely identified by the pairing of fill and outline color in B. Breeds are sorted 
by SBW. This analysis includes all 500 dogs genotyped. 



ences in body size. We used breed-averaged allele frequencies to 
calculate the proportion of phenotypic variance that these seven 
markers explained in a linear model. In order to determine which 
components should be present in the model, we first tested the 
mode of inheritance for each marker. We found that both HMGA2 
(P = 0.0094) and GHR(2) (P = 0.0366) have a significant dominance 
component, consistent with the log of the mean SBW of hetero- 
zygotes deviating from the mean of the homozygotes (Supple- 
mental Fig. S4). 

The resulting model related the log-transformed SBW to the 
allele frequencies at each of the seven size-associated markers and 
the breed-average of the dominance component for HMGA2 and 
GHR(2). Derived allele frequencies at each marker accounted for 
86.0% of SBW variance for the 93 breeds, as measured by the ad- 
justed R-squared (Fig. 5A). The terms corresponding to all markers 
except GHR(2) were significant by ANOVA (P < 0.05). 
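
The per-breed predictors and the adjusted R-squared can be sketched as follows. The coding of the dominance component as the fraction of heterozygotes in a breed is an assumption (the text does not give the coding), as are the function names; the adjusted R-squared formula is the standard one.

```python
def breed_terms(genotypes):
    """Breed-averaged terms for one marker in one breed.

    genotypes: list of 'A/A', 'A/D', or 'D/D', one entry per dog.
    Returns (derived allele frequency, dominance term), where the dominance
    term is assumed here to be the fraction of heterozygous dogs.
    """
    n_dogs = len(genotypes)
    n_derived = sum(g.count("D") for g in genotypes)
    allele_frequency = n_derived / (2 * n_dogs)
    dominance = sum(g == "A/D" for g in genotypes) / n_dogs
    return allele_frequency, dominance


def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Penalize R-squared for the number of predictors in a linear model."""
    return 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Invented breed sample of four dogs at one marker.
freq, dom = breed_terms(["A/A", "A/D", "D/D", "D/D"])
```

Under this reading, the model described above would have nine predictors (seven allele frequencies plus two dominance terms) fitted to 93 breed-level observations of log-transformed SBW.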

Because dog breeds do not represent a randomly mating 
population, we investigated the role of population structure in the 
explanatory power of the allele frequencies. In order to use them 
to correct for population structure, we calculated breed-averaged 
principal components (PCs) from genome-wide SNP profiles for 



each of the 65 breeds that were present in both our data set and the 
CanMap data set (Boyko et al. 2010). We then compared the SBW 
variance explained with and without terms representing PCs, us- 
ing only PCs that are significantly predictive of SBW and the 65 
breeds that have PCs. Genotypes alone account for 85.8% of var- 
iance; PCs alone, 44.2%; and PCs and genotypes together, 90.0%. 
These variances are not additive, but rather they indicate the upper 
limit that each could contribute in our model. Taken together, our 
uncorrected and population-corrected models show that geno- 
types at these seven loci account for between 45.8% and 85.8% of 
SBW variance. 
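
The bounds quoted above can be reproduced from the three reported values. The lower bound appears to correspond to the variance gained when genotypes are added to a model that already contains the PCs; this interpretation, and the variable names, are assumptions made for the sketch.

```python
# Percent of SBW variance explained in the 65-breed subset (values from the text).
R2_GENOTYPES = 85.8  # genotypes alone
R2_PCS = 44.2        # principal components alone
R2_BOTH = 90.0       # genotypes and principal components together

# Upper bound: everything the genotypes explain when no correction is applied.
upper_bound = R2_GENOTYPES

# Lower bound: what genotypes add beyond the principal components.
lower_bound = round(R2_BOTH - R2_PCS, 1)

# Variance that either set of predictors could account for.
shared = round(R2_GENOTYPES + R2_PCS - R2_BOTH, 1)
```

The shared portion is the reason the individual figures are not additive: it cannot be attributed uniquely to the markers or to population structure.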

As an alternate approach to accounting for population 
structure, corrected SBW (cSBW) values were determined by taking 
the residuals of a linear regression of SBW on PCs. Allele frequen- 
cies would then explain 52.5% of the variance in the resulting 
cSBWs (Fig. 5B). The two numbers, 46% and 52.5%, bracket a 
conservative estimate of the variance of SBW explained by geno- 
types at these markers. 
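
The residual-based correction can be illustrated with a deliberately simplified Python sketch. It uses a single PC and a single allele-frequency predictor, whereas the analysis described above uses several of each; the function names and the toy data are assumptions.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y on one predictor x: (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def residuals(x, y):
    """Residuals of y after regression on x."""
    intercept, slope = fit_line(x, y)
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def r_squared(x, y):
    """Proportion of the variance in y explained by x."""
    mean_y = sum(y) / len(y)
    ss_res = sum(r ** 2 for r in residuals(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Toy data for four breeds (invented values).
sbw = [1.0, 2.0, 3.0, 4.0]
pc1 = [0.0, 1.0, 0.0, 1.0]
allele_freq = [1.0, 1.0, 0.0, 0.0]

csbw = residuals(pc1, sbw)             # step 1: remove structure from SBW
explained = r_squared(allele_freq, csbw)  # step 2: explain what remains
```

In this two-step scheme, any size variation that tracks population structure is removed before the markers are tested, which is why it gives a conservative estimate.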

The seven markers are less informative in large and giant dog 
breeds. Allele frequencies accounted for 64.3% of cSBW variance in 
dogs with SBWs <40.8 kg (90 lb), but only 8.4% of cSBW among 
Figure 5. Allele frequencies at size markers explain 86% of size variation before correction for population structure (A) and 52.5% after (B). (A) A linear 
model was generated to assess the power of breed-averaged allele frequencies to explain variance in standard breed weights (SBWs). SBWs in lb (in 
parentheses) were transformed by natural log to approximate a normal distribution as was done in previous studies (Boyko et al. 2010). The black line 
indicates perfect equality of the fitted values with the SBWs. The cluster of breeds with a fitted weight of 90 lb (40.8 kg) reflects the lack of informativeness 
of these loci for large breeds. Small amounts of scatter (<0.05) were added to plotted values to reduce overplotting (n = 93). (B) A correction for population 
structure was performed by regressing the SBW on breed-averaged, genome-wide principal components (PCs). More than half (52.5%) of the variance in 
the residuals of this regression, the corrected SBWs (cSBWs), was explained by allele frequencies at the seven size markers. Since PCs were calculated from 
the CanMap data set, cSBWs could only be calculated for the 65 breeds that were present in both our data set and the CanMap data set. 



dogs with SBW >40.8 kg. This is reflected in the cluster of fitted 
values around 90 lb in Figure 5, A and B. These points represent 
breeds that were homozygous for the ancestral allele at all markers 
and reflect the lack of relevance of these markers to differences in 
size among large and giant breeds. 

The genotype-phenotype relationships were subjected to 
further analysis. We found no significant interactions between 
markers (Supplemental Results). We also found that the model is 
unlikely to be overfitting the data, since 42.1% of cSBW variance in 
a test set could be accounted for using coefficients calculated by 
a training set. The comparable number in the full 65-breed cSBW 
data set is 52.5%. The applicability of our findings to individuals 
was also assessed. Tests with 124 individually weighed dogs showed 
that 74.4% of the variance of their uncorrected weight could be 
accounted for using individual allele frequencies with coefficients 
derived from calculations based on uncorrected SBW, in which 
86% of SBW variance was explained with allele frequencies aver- 
aged by breeds. The cross-validation and the model's ability to 
explain size variance in individuals underscore the substantial 
nature of the effects we describe. 
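
The overfitting check described above amounts to fitting coefficients on one subset of breeds and scoring them on another. A minimal single-predictor sketch is shown below; the split, the function names, and the data are invented for illustration and do not reproduce the analysis itself.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y on one predictor x: (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def out_of_sample_r_squared(x_test, y_test, intercept, slope):
    """Variance in the test set explained by coefficients fitted elsewhere."""
    mean_y = sum(y_test) / len(y_test)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2
                 for xi, yi in zip(x_test, y_test))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y_test)
    return 1 - ss_res / ss_tot


# Invented training and test breeds: predictor and size values.
x_train, y_train = [0.0, 1.0, 2.0], [3.0, 2.0, 1.0]
x_test, y_test = [0.0, 2.0], [3.5, 1.5]

intercept, slope = fit_line(x_train, y_train)
test_r2 = out_of_sample_r_squared(x_test, y_test, intercept, slope)
```

A test-set value well below the in-sample value would indicate overfitting; a modest drop, such as the one reported above, suggests the coefficients generalize.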

Discussion 

We have identified the source of approximately half of size varia- 
tion in domestic dog breeds by genotyping DNA from 500 dogs at 
seven markers, five of which we identified by fine-mapping, and 
two of which we identified previously (Sutter et al. 2007; Hoopes 
et al. 2012). The dog breeds we analyzed were selected to represent 
the full range of canine body size, and by analyzing the relation- 
ship of the standard breed weight with genotype, the underlying 
pattern was revealed: For each variant, the derived allele corre- 
sponds to reduced body size relative to the ancestral gray wolf, and 
the presence of derived alleles at multiple variants further reduces 
body size. In a linear model, allele frequencies account for ~86% 
of variance in SBW without correction and, conservatively, 46%- 
52.5% after correcting for population structure, a degree of ex- 



planatory power rarely seen in genetic studies. This strong statis- 
tical relationship of genotypes with phenotype is compelling evi- 
dence of functional effects by variants in LD with these markers, if 
not the markers themselves. 

Modern dog breeds are defined by rigorous standards, which 
describe the ideal representatives of the breed. For genetic studies 
of these strongly selected traits, the fixed phenotype can be used as 
a proxy for the individual's genetically determined phenotype, as 
we (Sutter et al. 2007, 2008; Jones et al. 2008; Boyko et al. 2010) 
and others (Vaysse et al. 2011) have done previously, and as we 
have done here. By leveraging AKC breed standards and averaged 
measurements from registered dogs, we can reduce the effect of 
environment, thus targeting genes underlying strongly selected 
traits, which often reflect the defining features of a breed. In this 
study, the approach resulted in the identification of variants under 
strong selection by breeders that correspond to major differences 
in overall breed body size. By virtue of our study design, intrabreed 
body size variation is discounted and genes that contribute ex- 
clusively to it will not be identified. Indeed, given that the par- 
ticipants in our study are mostly show animals that compete for 
breed standard conformation titles, we expect the genetic contri- 
bution of intrabreed size variants to be minor compared with those 
that are the major contributors to interbreed size differences. 

The starting point of this study was our earlier multibreed 
GWAS, which used breed standard weight to identify QTLs asso- 
ciated with dog body size variation (Boyko et al. 2010), a study 
which found that six size-associated SNP chip markers explained 
72% of variance of SBWs without correction for population struc- 
ture. Our fine-mapping experiments were designed to identify the 
most highly associated and diagnostic variants. Each variant is po- 
tentially causal, with compelling cases for three of the variants: the 
two protein-altering SNPs in GHR and the SNP in the 5' UTR of 
HMGA2. The SMAD2 variant is a large deletion (9.9 kb) that appears 
to be in complete LD with a neighboring 5.7-kb deletion. Although 
the deletions are more than 15 kb from the gene, they could po- 
tentially affect transcription efficiency, as predicted by the loss of 
a transcription factor binding site cluster (Supplemental Fig. 3). 
The STC2 SNP is the least likely to be the causal variant, as it only 
affects a single base and is 20 kb from the gene. However, it is 
highly associated and therefore remains an excellent marker. There 
are no better-associated markers in the exons of the studied genes, 
yet the possibility remains that there are better-associated markers 
within the extensive range of regulatory effect. As high-throughput 
sequencing is applied to more dog genomes, further information 
about the potentially functional role of regions far from genes will 
be available. 

Several of the genes reported in this study are known to be 
involved in size regulation in other organisms. IGF1, IGF1R, and 
GHR participate in the GH/IGF1 pathway, which is required for 
normal stature in humans. Mutations in the GH/IGF1 pathway genes 
have been associated with human growth disorders (Walenkamp 
and Wit 2006; Rosenfeld et al. 2007; David et al. 2011). The interdependence 
of these three proteins (GHR, IGF1, and IGF1R) is 
well documented (David et al. 2011), but we see no strong evidence 
for statistical interactions in the effects of the variants studied here. 

GHR is an attractive candidate for canine size regulation be- 
cause it is implicated in human body size (Amselem et al. 1989; 
Ayling et al. 1997) and affects IGF1 signal transduction (David 
et al. 2011). Human studies suggest a mechanism by which the 
GHR variants identified here could cause reduced body size. The 
GHR SNPs selected in this study are located in the extracellular 
domain of the canine GH receptor. In the syntenic human exon, 
three disease-associated SNPs have been reported <25 amino acids 
away. These SNPs affect growth hormone binding and are believed 
to cause a human growth hormone insensitivity disorder termed 
Laron Syndrome (Wojcik et al. 1998). 

HMGA2 has been associated with height determination in 
multiple human GWAS (Weedon et al. 2007, 2008; Gudbjartsson 
et al. 2008; Lettre et al. 2008; Sanna et al. 2008; Soranzo et al. 2009; 
N'Diaye et al. 2011; Carty et al. 2012). HMGA2 is a transcription 
factor expressed during embryonic and fetal development (Rogalla 
et al. 1996; Gattas et al. 1999). Hmga2 knockout mice have a pygmy 
phenotype, characterized by reduced birth weight and growth re- 
tardation (Benson and Chada 1994; Zhou et al. 1995). 

Neither STC2 nor SMAD2 has been implicated in size determination 
in humans. However, STC2, a secreted glycoprotein 
hormone, inhibits growth in mice independently of the GH/IGF1 
pathway (Gagliardi et al. 2005; Chang et al. 2008). Although no 
SMAD2-mediated size phenotype has been reported, it is a tran- 
scription factor known to transduce signals from members of 
the transforming growth factor beta (TGF-beta) superfamily 
(Moustakas and Heldin 2009; Wu and Hill 2009). An appealing 
possibility is that the deletion identified proximal to SMAD2 is 
acting in cis to alter this gene's expression in developmental 
processes such as myogenesis, chondrogenesis, or osteogenesis 
(Sartori et al. 2009; Song et al. 2009; Chen et al. 2012). 

While it is not surprising that genes with a conserved role 
in mammalian size determination might have variants in both 
humans and dogs, both the population structure and the study 
methods complicate comparisons. GWAS in humans have identi- 
fied 180 loci significantly associated with height (Gudbjartsson 
et al. 2008; Lettre et al. 2008; Sanna et al. 2008; Weedon et al. 2008; 
Soranzo et al. 2009; Kim et al. 2010; Lango Allen et al. 2010; 
N'Diaye et al. 2011; Carty et al. 2012). However, even together, 
these loci account for only ~10% of the adult human height
variation (Lango Allen et al. 2010), although the heritability of
height is ~80% (Silventoinen 2003; Visscher et al. 2006; Perola
et al. 2007). Unlike our approach, which uses SBWs, human studies
have focused on individual measurements of subpopulations, en-
abling partitioning of variance attributable to environment and
capturing intra-group variation. However, methodological ap-
proaches are unlikely to explain the entire difference in study
results (six genes explaining ~50% of SBW in dogs vs. 180 loci
explaining ~10% of individual height in humans). Dogs are under
intense artificial selection and have a much greater range of sizes 
than humans. The relative subtlety of height regulation in humans 
may be more typical of species subjected to many thousands of 
generations of natural selection, such as wolves. We hypothesize 
that the variants of large effect in dogs that we have found are 
superimposed on a subtler size-regulation system inherited from 
wolves. 

In addition to explaining ~65% of variance in dogs <40.8 kg
(90 lb), this study defines two substantial types of body size vari-
ation that remain to be explained: 35% of body size variation in
dogs <40.8 kg, and ~90% of the body size variation among dogs
weighing >40.8 kg. Some of the unexplained variation in dogs 
<40.8 kg is evident in breeds like Shih Tzu and Pugs. Shih Tzu 
weigh 20% less than Pugs, but most individuals belonging to either 
breed have identical genotypes at the seven size variants studied 
here. To investigate size determination on a finer scale, individual 
dog weights and perhaps measurements will be necessary. In- 
dividual weights and measurements may also permit the elucida- 
tion of epistatic relationships, which have been observed in other 
domesticated species (Carlborg et al. 2006). 

Although the IGF1 derived allele is typically found in small
dogs, we also found it in Rottweilers, consistent with previous reports
(Sutter et al. 2007), and in other large mastiff-related breeds (Fig. 4).
We offer two explanatory hypotheses. First, it is possible that
neither of the two IGF1 variants genotyped in this study (a SNP and
a SINE insertion) is causal; rather, they tag the ancestral haplotype
on which the causal variant first emerged. Thus, some very large
breeds could carry the tagging variants and yet lack the causal
variant. Alternatively, epistasis with a yet-unidentified locus may
reduce the effects of the IGF1 small allele in some large dog breeds.

The genotypes of dogs from breeds with an SBW >40.8 kg
(90 lb), represented by 18 breeds in our study, allow us to distin-
guish them from small and medium dogs, but not from each other
(Fig. 5A). The genotypes at the seven size markers account for <9%
of differences among dogs over 40.8 kg. Clearly, other loci that 
contribute to large body size in dogs remain to be found, and 
further analysis of these giant breeds is warranted. 

Size determination in large and giant dogs probably shares 
features of size determination observed in small and medium-sized 
dogs. Several size-associated intervals on the X chromosome have 
been identified, but not studied further (Boyko et al. 2010; Vaysse
et al. 2011). In a predictive model that considered breeds of all
sizes, adding the locus at 104 Mb on the X chromosome to a model
with only the IGF1 locus increased the amount of variance ex-
plained from 47.6% to 57.8%, without correction for population
structure (Boyko et al. 2010). However, our ongoing efforts in-
dicate that fine-mapping the chromosome X loci is extremely
challenging, as LD on this chromosome extends over megabases 
(M Rimbault, unpubl.) and includes dozens of genes. 

Size determination could also be more wolf-like in large dogs 
than in small dogs. Compared with small dogs, the sizes of large 
dogs overlap more with the sizes of wolves. Wolves vary substan- 
tially in size, with the weights of adult male wolves in Yellowstone 
National Park alone ranging from 38 to 66 kg (85 to 145 lb) 
(MacNulty et al. 2009). Size determination in wolves may be more
similar to height determination in humans than to that in an artificially
selected group like domestic dogs, and may result from the collective
effects of many variants of small effect (Lango Allen et al. 2010).

In this study, we identified markers at loci that define the 
major size ranges in domestic dogs and show how combinations of 
alleles produce the extensive range of dog sizes present in modern 
breeds. It remains to be seen how size is regulated on a finer scale, 
within breeds, between sexes, and among giant dogs. Some of 
these studies will require individual measurements and perhaps 
larger numbers to compensate for noise due to environmental ef- 
fects. It will also be valuable to extend our existing findings by 
identifying the functional consequences of size-determining var- 
iants. It is our hope that these studies can shed light on growth- 
related health issues in dogs and humans. 

Methods 

All coordinates refer to the CanFam3.1 dog genome assembly 
(Sept. 2011). Unless otherwise noted, analysis was performed using 
the software program R (R Development Core Team 2012), and 
figures were generated with R base graphics and the plotting pack- 
age ggplot2 (Wickham 2009). The genes and identification numbers 
in Figure 1 are: SEPP1 (NM_001115118), GHR (NM_001003123),
NKX2-5 (NM_001010959), STC2 (ENSCAFG00000031727), SMAD2 
(ENSCAFG00000017567), MSRB3 (ENSCAFG00000029740), and 
HMGA2 (BLAT results of KC529658). 

Sample collection and DNA extraction 

Blood samples were collected from dogs belonging to AKC-registered 
breeds at AKC-sanctioned dog shows, specialty events, breed clubs, 
and veterinary clinics. Samples were collected as whole blood into 
ACD or EDTA anticoagulant tubes after obtaining written consent 
from dog owners. Genomic DNA was isolated from whole blood 
using a standard proteinase-K/phenol:chloroform extraction pro-
tocol (Maniatis et al. 1982). All procedures were reviewed and ap-
proved by the NHGRI Animal Care and Use Committee at the
National Institutes of Health. 

Phenotype assignment 

Standard breed weights were obtained from several publications. 
If the AKC specified a weight for a breed, it was used (American 
Kennel Club 1998). If separate values were listed for males and 
females, those values were averaged. When the AKC did not specify 
a weight or if only an upper or lower limit was specified, we used 
data from The Encyclopedia of the Dog (Fogle 1995). If no weight was 
specified, we utilized data from Atlas of Dog Breeds of the World 
(Wilcox and Walkowicz 1995). We also considered weights re- 
corded in our NHGRI database of individual dogs. These owner- 
reported weights were collected at AKC-sanctioned dog shows, 
breed specialty events, and breed club meetings. If there were 
more than six adult dog weights listed for a breed in our database, 
we removed the maximum and minimum weights listed and 
compared the mean of the remaining weights with the published 
breed standard weight. If the weights differed by >20%, we used the 
mean breed weight from our database. A list of the breeds and of 
the standard breed weight used in this study can be found in 
Supplemental Table 1. Because the phenotype of interest is size, we 
treated the three varieties of poodles as separate breeds. 
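
The database check described above can be illustrated with a minimal Python sketch. It is not the code used in this study; the function name, the argument names, and the choice to measure the 20% difference relative to the published standard weight are assumptions made for illustration.

```python
from statistics import mean


def assign_breed_weight(standard_weight, database_weights, tolerance=0.20):
    """Return the weight to use for one breed (hypothetical helper).

    standard_weight  : published breed standard weight
    database_weights : owner-reported adult weights, same unit as the standard
    tolerance        : fractional difference that triggers the database mean
    """
    # The database is consulted only when more than six adult weights exist.
    if len(database_weights) <= 6:
        return standard_weight

    # Remove the single minimum and single maximum, then average the rest.
    trimmed = sorted(database_weights)[1:-1]
    trimmed_mean = mean(trimmed)

    # Assumption: the difference is expressed relative to the published value.
    if abs(trimmed_mean - standard_weight) / standard_weight > tolerance:
        return trimmed_mean
    return standard_weight
```

For example, a breed with a published weight of 50 and seven database weights between 30 and 36 would be assigned the trimmed database mean of 33, whereas the same breed with only six database weights would keep the published value.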

Genotyping of the highly associated markers 

The highly size-associated markers (Table 1) were genotyped on an 
additional set of samples consisting of 500 dogs, termed the vali-
dation set, from 93 AKC-recognized breeds representing the full
range of canine body size (Supplemental Table 6). Dogs were un- 
related to one another at the grandparent level. Forty-one percent 
of the 500 dogs had also been included in the CanMap data set 
(Boyko et al. 2010). The validation set was not fully independent 
from this study's discovery set either: 13 dogs, six small and seven 
large, were used for both marker discovery and validation ex- 
periments. Wild canids, including 26 geographically diverse 
gray wolves from North America, Europe, and Asia (10 females 
and 16 males), two coyotes (one female and one male), and two 
red wolves (two males) were also genotyped (Supplemental 
Table 7). 

Three hundred and eighty-four dogs were genotyped at the
following markers using a GoldenGate genotyping assay (Illu-
mina): IGF1, IGF1R, GHR(1), and STC2. GoldenGate genotypes at
one position, GHR(1), were all validated by Sanger sequencing with
100% concordance. The remaining dogs and variants were geno-
typed by PCR and Sanger sequencing (see Supplemental Methods
for reaction conditions). The SINE insertion in intron two of
IGF1 was genotyped by PCR amplification, and PCR products
were analyzed after migration on 1% agarose gels to determine
the presence or absence of the insertion. To genotype the 9.9-kb
deletion downstream from the SMAD2 gene on CFA7, PCR prod-
ucts from two different primer pairs were analyzed on 1% agarose
gels to determine the absence or the presence of the deletion. A
list of the primers and PCR conditions is given in Supplemental
Table 8.

Model 

In all models, we used the natural log of weight in lb to approxi- 
mate a normal distribution, as was done previously (Boyko et al. 
2010). Twenty principal components were calculated on the CanMap 
data set using SmartPCA from the Eigensoft package (Patterson 
et al. 2006; Price et al. 2006). We used a pruned data set that ex- 
cludes individuals with >10% missing genotype data, SNPs in high 
LD as defined by pairwise genotypic r² > 0.8 within sliding win-
dows of 50 SNPs, and SNPs that were within 2 Mb of the most
strongly size-associated markers at each of the six loci. Outliers of 
more than six σ were excluded, as were breeds with fewer than four
dogs remaining after individual outliers were excluded. The breed 
average of each PC was calculated. Ninety-three breeds were rep- 
resented in the pool of dogs genotyped at all seven markers; PC 
values were available for 65 of those breeds (Supplemental Table 9). 
The PCs predictive of SBW (PC2, PC4, PC6, PC10, PC11, and PC18
at P < 0.05) were used for subsequent corrections for population 
structure. 

Significant dominance components were identified by ap- 
plying a nested ANOVA to each marker with and without a partial 
dominance term. PCs that were significant for weight were in- 
cluded in both equations. 
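
The nested comparison for a dominance component can be illustrated with a short Python sketch. It is a simplification under stated assumptions: genotypes are coded as derived-allele counts (0, 1, 2), the PC covariates that were included in both equations are omitted, and the function name is hypothetical. Without covariates, the model with an added heterozygote term fits each genotype-class mean exactly, so no matrix algebra is needed.

```python
def dominance_f_statistic(genotypes, log_weights):
    """F statistic (1 and n - 3 df) for adding a dominance term.

    genotypes   : derived-allele counts, each 0, 1, or 2
    log_weights : natural log of weight for the same observations
    """
    n = len(genotypes)
    if set(genotypes) != {0, 1, 2} or n <= 3:
        raise ValueError("all three genotype classes and n > 3 are required")

    # Reduced model: additive only (simple linear regression on allele count).
    mean_g = sum(genotypes) / n
    mean_y = sum(log_weights) / n
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    sxy = sum((g - mean_g) * (y - mean_y)
              for g, y in zip(genotypes, log_weights))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_g
    rss_additive = sum((y - (intercept + slope * g)) ** 2
                       for g, y in zip(genotypes, log_weights))

    # Full model: additive + dominance, equivalent to one mean per genotype.
    by_class = {}
    for g, y in zip(genotypes, log_weights):
        by_class.setdefault(g, []).append(y)
    class_mean = {g: sum(ys) / len(ys) for g, ys in by_class.items()}
    rss_full = sum((y - class_mean[g]) ** 2
                   for g, y in zip(genotypes, log_weights))
    if rss_full == 0:
        raise ValueError("full model fits perfectly; F is undefined")

    # One extra parameter in the full model; n - 3 residual degrees of freedom.
    return (rss_additive - rss_full) / (rss_full / (n - 3))
```

A large F value indicates that heterozygotes deviate from the midpoint of the two homozygous classes; a P-value would be obtained from the F distribution with 1 and n - 3 degrees of freedom.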

For the purposes of testing our model on individual dogs, we 
used individual measurements from dogs in our own database. For 
each dog, we calculated the mean and standard deviation of the 
other dogs in the breed. If a dog's Z-score relative to those numbers 
exceeded 1.5, the dog was excluded. 
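
A minimal Python sketch of this leave-one-out exclusion rule follows. The function name is hypothetical, and three details are assumptions rather than statements from the text: the absolute value of the Z-score is used, the sample standard deviation is used, and every dog is compared against all other dogs in the original list rather than iteratively.

```python
from statistics import mean, stdev


def filter_breed_outliers(weights, z_cutoff=1.5):
    """Return the weights retained for one breed (hypothetical helper)."""
    kept = []
    for i, weight in enumerate(weights):
        # Mean and SD are computed from the other dogs in the breed only.
        others = weights[:i] + weights[i + 1:]
        if len(others) < 2:
            kept.append(weight)  # too few dogs to estimate a spread
            continue
        centre = mean(others)
        spread = stdev(others)
        if spread == 0:
            # All other dogs weigh the same; keep only an exact match.
            if weight == centre:
                kept.append(weight)
            continue
        if abs(weight - centre) / spread <= z_cutoff:
            kept.append(weight)
    return kept
```

Computing the reference statistics without the dog under test prevents an extreme individual from inflating the standard deviation that is used to judge it.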

Data access 

The sequence containing the first exon of canine HMGA2 has been 
submitted to the National Center for Biotechnology Information 
(NCBI) GenBank (http://www.ncbi.nlm.nih.gov/genbank) under 
accession number KC529659. The mRNA sequence of HMGA2 
has been submitted to NCBI GenBank under accession number 
KC529658.